Dissimilarity measures for histogram-valued data and divisive clustering of symbolic objects
Abstract
Contemporary datasets are becoming increasingly large and complex, while the techniques available to analyse them are becoming increasingly inadequate, so new methods are needed to handle these new types of data. This study introduces methods to cluster histogram-valued data. Histogram-valued data are difficult to handle computationally because observations typically differ in the number and length of their subintervals. A transformation for histogram data is therefore proposed to make them easier to handle computationally. Based on this transformation, three new dissimilarity measures for histogram data are proposed. The study then shows how the monothetic clustering algorithm based on Chavent (1998, 2000) can be extended to histogram data, and develops a polythetic clustering algorithm for symbolic objects (based on all p variables). Validity criteria to aid in selecting the optimal number of clusters are described and verified by simulation studies. The new methodology is illustrated on a large dataset collected from the US Forestry Service.
Index words: Symbolic data, Histogram-valued data, Dissimilarity measure, Monothetic algorithm, Polythetic algorithm, Validity
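To make the computational difficulty concrete, the following is a minimal sketch of one way to bring two histograms onto common subintervals before comparing them. It is not the transformation or any of the three dissimilarity measures proposed in the study, which the abstract does not specify. It assumes that mass is spread uniformly within each subinterval, and it uses a plain Euclidean distance as a stand-in. The names `rebin`, `common_subintervals`, and `euclidean_dissimilarity` are hypothetical.

```python
from typing import List, Tuple

# A histogram-valued observation: (subinterval edges, relative frequencies).
Histogram = Tuple[List[float], List[float]]


def common_subintervals(h1: Histogram, h2: Histogram) -> List[float]:
    """Union of both sets of edges, sorted (assumed choice of common grid)."""
    return sorted(set(h1[0]) | set(h2[0]))


def rebin(h: Histogram, grid: List[float]) -> List[float]:
    """Re-express h on `grid`, assuming uniform mass within each subinterval."""
    edges, probs = h
    out = [0.0] * (len(grid) - 1)
    for k in range(len(grid) - 1):
        lo, hi = grid[k], grid[k + 1]
        for j, p in enumerate(probs):
            a, b = edges[j], edges[j + 1]
            overlap = min(hi, b) - max(lo, a)
            if overlap > 0:
                # Share of the original subinterval that falls inside [lo, hi].
                out[k] += p * overlap / (b - a)
    return out


def euclidean_dissimilarity(h1: Histogram, h2: Histogram) -> float:
    """Stand-in measure: Euclidean distance between the rebinned frequencies."""
    grid = common_subintervals(h1, h2)
    p1, p2 = rebin(h1, grid), rebin(h2, grid)
    return sum((x - y) ** 2 for x, y in zip(p1, p2)) ** 0.5


# Two observations with different numbers and lengths of subintervals.
h1 = ([0.0, 2.0, 4.0], [0.5, 0.5])
h2 = ([0.0, 1.0, 4.0], [0.25, 0.75])
h3 = ([0.0, 2.0, 4.0], [1.0, 0.0])

d12 = euclidean_dissimilarity(h1, h2)
d13 = euclidean_dissimilarity(h1, h3)
```

In this sketch `h1` and `h2` use different subintervals but both describe a uniform spread over [0, 4], so once they are placed on the common grid their rebinned frequencies coincide. `h3` concentrates all of its mass on [0, 2] and is therefore separated from `h1`.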
Similar resources
Relational partitioning fuzzy clustering algorithms based on multiple dissimilarity matrices
This paper introduces fuzzy clustering algorithms that can partition objects taking into account simultaneously their relational descriptions given by multiple dissimilarity matrices. The aim is to obtain a collaborative role of the different dissimilarity matrices to get a final consensus partition. These matrices can be obtained using different sets of variables and dissimilarity functions. T...
A New Approach to Detect Congestive Heart Failure Using Symbolic Dynamics Analysis of Electrocardiogram Signal
The aim of this study is to show that the measures derived from Electrocardiogram (ECG) signals often perform better than the same measures obtained from heart rate (HR) signals. A comparison was made to investigate how far the nonlinear symbolic dynamics approach helps to characterize the nonlinear properties of ECG signals and HR signals, and thereby discriminate between normal and cong...
Divisive Monothetic Clustering for Interval and Histogram-valued Data
We consider some classically based methods for fitting a multiple regression model to interval-valued data (de Carvalho et al., 2004; Lima Neto et al., 2005; Lima Neto and de Carvalho, 2010). Then, a so-called symbolic model is fitted where now the regression parameters are estimated by using the symbolic sample covariance and variance functions of Billard (2008) and Bertrand and Goupil (2000). ...
Analysis of Distribution Valued Dissimilarity Data
We deal with methods for analyzing complex structured data, especially, distribution valued data. Nowadays, there are many requests to analyze various types of data including spatial data, time series data, functional data and symbolic data. The idea of symbolic data analysis proposed by Diday covers a large range of data structures. We focus on distribution valued dissimilarity data and multid...